Fast Feature Selection Using Fractal Dimension
Abstract
The dimensionality curse and dimensionality reduction are two issues that have remained of high interest in data mining, machine learning, multimedia indexing, and clustering. We present a fast, scalable algorithm to quickly select the most important attributes (dimensions) for a given set of n-dimensional vectors. In contrast to older methods, our method has the following desirable properties: (a) it does not rotate attributes, which makes the resulting attributes easy to interpret; (b) it can spot attributes that have nonlinear correlations; (c) it requires a constant number of passes over the dataset; (d) it gives a good estimate of how many attributes we should keep. The idea is to use the ‘fractal’ dimension of a dataset as a good approximation of its intrinsic dimension, and to drop attributes that do not affect it. We applied our method to real and synthetic datasets, where it gave fast and good results.

1 Introduction and Motivation

When managing the increasing volume of data generated by organizations, a question that frequently arises is: “what part of this data is really relevant to be kept?”. Notice that the relations of a database usually have many attributes that are correlated with one another. Attribute selection is a classic goal, as is battling the “dimensionality curse” [Berchtold_1998] [Pagel_2000]. A carefully chosen subset of attributes improves the performance and efficacy of a variety of algorithms. This is particularly true of redundant data, as many datasets can be well approximated in fewer dimensions. Attribute selection can also be seen as a way to compress data, as only the attributes that maintain the essential characteristics of the data are kept [Fayyad_1998]. In this paper we introduce a novel technique that can discover how many attributes are significant to characterize a dataset. We also present a fast, scalable algorithm to quickly select the most significant attributes of a dataset.
In contrast to other methods, such as Singular Value Decomposition (SVD) [Faloutsos_1996], our method has the following desirable properties: (a) it does not rotate attributes, which makes the resulting attributes easy to interpret; (b) it can spot attributes that have nonlinear and even non-polynomial correlations; (c) it is linear in the number of objects in the dataset;
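The core idea, dropping attributes whose removal leaves the fractal dimension essentially unchanged, can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a plain box-counting estimate of the correlation fractal dimension (the slope of log S(r) against log r, where S(r) is the sum of squared cell occupancies for grid cells of side r) and a simple backward elimination that rebuilds the grid for every trial. The function names, the `levels` grid resolutions, and the `threshold` stopping value are all hypothetical choices made for illustration.

```python
import numpy as np


def correlation_fractal_dimension(X, levels=(1, 2, 3, 4)):
    """Box-counting estimate of the correlation fractal dimension.

    Fits the slope of log S(r) versus log r, where S(r) is the sum of
    squared point counts over grid cells of side r = 2**-level.
    """
    X = np.asarray(X, dtype=float)
    # Rescale every attribute to [0, 1) so one grid fits all attributes.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant attributes: avoid division by zero
    U = np.clip((X - lo) / span, 0.0, 1.0 - 1e-12)

    log_r, log_s = [], []
    for level in levels:
        cells_per_side = 2 ** level
        cells = np.floor(U * cells_per_side).astype(np.int64)
        # Count how many points fall into each occupied grid cell.
        _, counts = np.unique(cells, axis=0, return_counts=True)
        log_r.append(np.log(1.0 / cells_per_side))
        log_s.append(np.log(np.sum(counts.astype(float) ** 2)))

    slope, _ = np.polyfit(log_r, log_s, 1)
    return float(slope)


def select_attributes(X, threshold=0.1, levels=(1, 2, 3, 4)):
    """Backward elimination driven by the fractal dimension (assumed scheme).

    Repeatedly drops the attribute whose removal changes the estimated
    dimension the least, and stops once every removal would change it by
    more than `threshold` (an illustrative value, not from the paper).
    """
    X = np.asarray(X, dtype=float)
    kept = list(range(X.shape[1]))
    current = correlation_fractal_dimension(X[:, kept], levels)
    while len(kept) > 1:
        trials = []
        for j in kept:
            rest = [k for k in kept if k != j]
            d = correlation_fractal_dimension(X[:, rest], levels)
            trials.append((abs(current - d), j, d))
        change, j, d = min(trials)
        if change > threshold:
            break  # every remaining attribute matters
        kept.remove(j)
        current = d
    return kept, current
```

For example, with two independent uniform attributes `a` and `b` and a third attribute `a**2`, calling `select_attributes(np.column_stack([a, b, a**2]))` would be expected to discard one of the two correlated attributes even though the correlation is nonlinear. The estimate needs enough points per grid cell to be stable, so coarse `levels` or larger samples are advisable in higher dimensions.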
Similar resources
An improved algorithm for feature selection using fractal dimension
Dimensionality reduction is an important issue in data mining and machine learning. Traina[1] proposed a feature selection algorithm to select the most important attributes for a given set of n-dimensional vectors based on correlation fractal dimension. The author used a kind of multi-dimensional “quad-tree” structure to compute the fractal dimension. Inspired by his work, we propose a new and ...
An Adaptive Segmentation Method Using Fractal Dimension and Wavelet Transform
In analyzing a signal, especially a non-stationary signal, it is often necessary to segment the signal into small epochs. Segmentation can be performed by splitting the signal at the time instants where its amplitude or frequency changes. In this paper, the signal is initially decomposed into signals with different frequency bands using the wavelet transform. Then, fractal dimension of ...
Adaptive Segmentation with Optimal Window Length Scheme using Fractal Dimension and Wavelet Transform
In many signal processing applications, such as EEG analysis, the non-stationary signal is often required to be segmented into small epochs. This is accomplished by drawing the boundaries of signal at time instances where its statistical characteristics, such as amplitude and/or frequency, change. In the proposed method, the original signal is initially decomposed into signals with different fr...
A Method for Labeling Brain Signals in Order to Classify Different States of Anesthesia
Aims and background: This study develops a computational framework for the classification of different anesthesia states, including awake, moderate anesthesia, and general anesthesia, using electroencephalography (EEG) signals and peripheral parameters. Materials and Methods: The proposed method proposes ...
Publication date: 2000